fix(declarative): Support nested $ref patterns in dynamic stream templates (do not merge)#708
Conversation
- Add generic object support to stream_template field in DynamicDeclarativeStream - Add generic object support to requester field in SimpleRetriever schemas - Add generic object support to stream field in ParentStreamConfig - Fixes validation regression introduced in PR #560 (commit f52bc2e) - Enables Airtable connector validation while maintaining backward compatibility - Allows with additional properties (e.g. path, http_method) Resolves validation issue: 'Validation against json schema defined in declarative_component_schema.yaml schema failed' Co-Authored-By: AJ Steers <aj@airbyte.io>
Original prompt from AJ Steers |
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1723407187-fix-dynamic-stream-template-validation#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1723407187-fix-dynamic-stream-template-validationHelpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
There was a problem hiding this comment.
Pull Request Overview
This PR fixes a schema validation regression that prevented the Airtable connector from validating against the declarative component schema. The issue was caused by overly restrictive validation that only accepted direct $ref patterns, breaking Airtable's nested reference pattern.
- Added generic
$refobject support to multiple schema fields to allow nested reference patterns - Maintained backward compatibility while enabling more flexible reference patterns
- Fixed validation errors that prevented Airtable connector from working in the Connector Builder UI
| type: string | ||
| pattern: "^#/definitions/" | ||
| required: ["$ref"] | ||
| additionalProperties: true |
There was a problem hiding this comment.
Setting additionalProperties: true allows any additional properties beyond $ref, which could lead to unexpected behavior or security issues. Consider restricting this to only allow known safe properties or set to false if additional properties aren't necessary.
| additionalProperties: true | |
| additionalProperties: false |
| properties: | ||
| $ref: | ||
| type: string | ||
| pattern: "^#/definitions/" |
There was a problem hiding this comment.
The regex pattern ^#/definitions/ only validates the prefix but doesn't prevent potentially malicious or malformed references like #/definitions/../../../sensitive/path. Consider using a more restrictive pattern that validates the complete reference structure.
| pattern: "^#/definitions/" | |
| pattern: "^#/definitions/[A-Za-z0-9_-]+$" |
📝 WalkthroughWalkthroughExpanded the declarative component schema to accept an inline object wrapper containing a $ref to definitions in six anyOf locations, alongside existing direct $ref usage. No runtime or control-flow changes; only schema validation broadened to allow both reference shapes. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Suggested labels
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (6)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (6)
1511-1517: Broadened retriever ref shape looks good; consider de-duplicating the wrapper schema.Allowing an inline object with a local "$ref" plus extra fields is spot on for nested and embellished refs, and the pattern anchor restricts external refs. Would you consider extracting this wrapper into a shared definition (e.g., LocalManifestRefObject) to avoid repeating the same schema in six places and prevent future drift, wdyt?
Example:
+ LocalManifestRefObject: + description: Inline wrapper for a local manifest reference under the top-level 'definitions' of the manifest. + type: object + properties: + $ref: + type: string + pattern: "^#/definitions/" + required: ["$ref"] + additionalProperties: trueThen here:
- - type: object - properties: - $ref: - type: string - pattern: "^#/definitions/" - required: ["$ref"] - additionalProperties: true + - "$ref": "#/definitions/LocalManifestRefObject"
2424-2430: LGTM; same inline $ref wrapper for DynamicSchemaLoader.retriever.This mirrors the earlier change and maintains consistency. Would you also swap this repeated block for the shared LocalManifestRefObject if you adopt the de-dup refactor above, wdyt?
3231-3237: LGTM; ParentStreamConfig.stream now supports the wrapper.This unblocks nested parent stream refs while keeping local-only restriction. If you adopt the shared definition, would you replace this block as well, wdyt?
3678-3684: LGTM; SimpleRetriever.requester now supports wrapper refs.Nice for reuse across retriever shapes. Same de-duplication suggestion applies here if you add a LocalManifestRefObject, wdyt?
4233-4239: LGTM; HttpComponentsResolver.retriever updated consistently.Good consistency across components. Would you also add a brief description to the wrapper definition (or to the shared one) clarifying it points to the manifest’s top-level 'definitions' (not JSON Schema defs), wdyt?
4374-4380: Fix for DynamicDeclarativeStream.stream_template looks correct; please add tests for regression coverage.This addresses the regression for nested stream templates while preventing external refs; could you add/extend tests to cover:
- passing cases: direct "$ref" and wrapper form,
- wrapper with extra fields (e.g., path/http_method) accepted,
- rejection of external refs (http://, file://),
- optionally a negative case where the wrapper points to a non-stream definition to document expected behavior?
Happy to draft test cases if helpful, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml(6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: destination-motherduck
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
fix(declarative): Support nested $ref patterns in dynamic stream templates
Summary
This PR fixes a schema validation regression introduced in PR #560 (commit f52bc2e) that prevented the Airtable connector from validating against the declarative component schema. The issue occurred when the
stream_templatefield inDynamicDeclarativeStreamwas changed from a simple$refto ananyOfstructure that only accepted direct references toDeclarativeStreamorStateDelegatingStream, breaking Airtable's nested reference pattern{"$ref": "#/definitions/streams/airtable_stream"}.Key Changes:
$refobject support tostream_templatefield inDynamicDeclarativeStream$refobject support torequesterfield in multiple SimpleRetriever schema definitions$refobject support tostreamfield inParentStreamConfig$refobjects with additional properties (e.g.,path,http_method) as used by AirtableThe fix maintains backward compatibility while enabling more flexible reference patterns needed by dynamic stream connectors like Airtable.
Review & Testing Checklist for Human
$refpattern support doesn't break existing connectors that use direct references^#/definitions/pattern prevents invalid external references while allowing valid nested referencesRecommended Test Plan:
$refpatternsDiagram
%%{ init : { "theme" : "default" }}%% graph TD Airtable["source-airtable/manifest.yaml"]:::context Schema["declarative_component_schema.yaml"]:::major-edit Builder["Connector Builder UI"]:::context Airtable --> |"validates against"| Schema Schema --> |"enables validation in"| Builder subgraph "Schema Changes" StreamTemplate["stream_template field<br/>(DynamicDeclarativeStream)"]:::major-edit RequesterField["requester field<br/>(SimpleRetriever)"]:::major-edit StreamField["stream field<br/>(ParentStreamConfig)"]:::major-edit end Schema --> StreamTemplate Schema --> RequesterField Schema --> StreamField subgraph Legend L1["Major Edit"]:::major-edit L3["Context/No Edit"]:::context end classDef major-edit fill:#90EE90 classDef context fill:#FFFFFFNotes
$refpattern#/definitions/streams/airtable_streamSession Details:
Summary by CodeRabbit